[MXNET-854] SVRG Optimization in Python Module API #12376
Conversation
Thanks @StephanieYuan - Welcome to the MXNet community! @piiswrong @eric-haibin-lin @anirudhacharya @Roshrini @vandanavk - You may be interested in taking a look at this.
A few high-level comments for now. I will dive deeper by building this change and playing around with MNIST.
Thanks for doing the benchmarks and adding the results. Looks promising and very useful.
It would be very helpful if we could benchmark on MNIST as depicted in the paper and see whether we can achieve the results reported there, to verify the correctness of the implementation.
Examples
--------
>>> # An example of declaring and using SVRGModule.
>>> mod = mod = SVRGModule(symbol=lro, data_names=['data'], label_names=['lin_reg_label'], update_freq=2)
Please correct this example.
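For reference, a corrected sketch of the example could look like the following; the network definition is illustrative, and the import path assumes the contrib location discussed later in this thread:

```python
import mxnet as mx
from mxnet.contrib.svrg_optimization.svrg_module import SVRGModule

# A small illustrative linear-regression symbol; the names mirror the docstring example.
data = mx.sym.Variable('data')
label = mx.sym.Variable('lin_reg_label')
fc = mx.sym.FullyConnected(data=data, num_hidden=1)
lro = mx.sym.LinearRegressionOutput(data=fc, label=label, name='lin_reg')

# Single assignment instead of the doubled "mod = mod = ...".
mod = SVRGModule(symbol=lro, data_names=['data'], label_names=['lin_reg_label'],
                 update_freq=2)
```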
if isinstance(update_freq, int):
    self.update_freq = update_freq
else:
    raise TypeError("update_freq must be an integer")
Can you make the error more useful for the user to understand the issue? For example: "update_freq in SVRGModule must be an integer, e.g. 2. Given update_freq = %s" % update_freq
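A rough sketch of a friendlier check; the helper name and exact wording below are illustrative, not the PR's actual code:

```python
def _check_update_freq(update_freq):
    """Validate update_freq and raise an actionable error (hypothetical helper)."""
    if not isinstance(update_freq, int):
        raise TypeError("update_freq in SVRGModule must be an integer, e.g. update_freq=2. "
                        "Got update_freq = %r of type %s"
                        % (update_freq, type(update_freq).__name__))
    return update_freq
```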
"""Internal function to reset binded state.""" | ||
super(SVRGModule, self)._reset_bind() | ||
self._mod_aux._reset_bind() | ||
|
nit: extra spaces?
self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
toc = time.time()
self.logger.info('Epoch[%d] Time cost=%.3f', epoch, (toc - tic))
print('Epoch[%d] Time cost=%.3f', epoch, eval_metric.get())
Please use the logger instead of print everywhere, e.g. self.logger.info.
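A minimal sketch of the suggested change to the quoted hunk, assuming `self.logger` and `eval_metric` from the surrounding code:

```python
# Replace the print with the module logger so output respects the configured log level.
for name, val in eval_metric.get_name_value():
    self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
```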
@mx.optimizer.register
class SVRGOptimizer(mx.optimizer.Optimizer):
    """SVRGOptimizer is a wrapper class for two optimizers: one for accumulating full gradients and the other
This is mainly useful with SGD right? Why allow any optimizer to be passed in here?
For the previous benchmarks, SVRG was used with SGD. Meanwhile, I did some exploration of NAG + SVRG and Adam + SVRG; I think it would be valuable to benchmark those optimizers with SVRG too.
@@ -0,0 +1,84 @@
# Licensed to the Apache Software Foundation (ASF) under one
Tests should go under tests/. Maybe something like below?
- tests/python/train/test_contrib_svrg.py: full end-to-end training and testing. See test_mlp.py as an example. Please note it should be fast (< 2 min at most); if it takes longer than that, we need to move it to the nightly tests.
- tests/python/unittest/test_contrib_svrgmodule.py / ..._svrgoptimizer.py, etc.
mod.forward_backward(data_batch=batch)
mod.update()
mod.update_metric(metrics, batch.label)
print('Epoch[%d] Time cost=%.3f', e, metrics.get())
Please use logging.
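A possible shape for the example script, using the standard library logging module; the variables `metrics` and `e` come from the quoted loop, and the single-metric handling is an assumption:

```python
import logging

logging.basicConfig(level=logging.INFO)

# Replaces: print('Epoch[%d] Time cost=%.3f', e, metrics.get())
name, value = metrics.get()  # single metric assumed; composite metrics return lists
logging.info('Epoch[%d] Train-%s=%f', e, name, value)
```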
:return: an instance of mx.io.NDArrayIter
:return: an instance of mx.mod.svrgmodule for performing SVRG optimization
"""
mx.random.seed(42)
- Please do not use fixed seed. You can see other tests under tests/
- Also, what are we asserting here?
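For context, other unit tests in the repository generally rely on the with_seed decorator from tests/python/unittest/common.py rather than a hard-coded seed; a rough sketch (the test name and body are illustrative):

```python
import mxnet as mx
from common import with_seed  # helper module located at tests/python/unittest/common.py

@with_seed()
def test_svrg_training_data_shapes():
    # The decorator picks and logs a seed per run, so failures stay reproducible
    # without pinning every run to mx.random.seed(42).
    data = mx.nd.random.normal(shape=(10, 2))
    assert data.shape == (10, 2)
```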
return di, mod

# run as a script
if __name__ == "__main__":
Should it be the following? Please see other tests.
if __name__ == '__main__':
    import nose
    nose.runmodule()
Moved svrg_optimization to python/contrib package.
@StephanieYuan The original version of SVRG could be time-consuming due to the computation of the full gradient at the beginning of each epoch. Could you also include the cheap version in your implementation:
In general, please add documentation for the input params for functions and a description of return variables wherever missing.
Please execute pylint on these files using ci/other/pylintrc in the incubator-mxnet folder.
Something like this:
pylint --rcfile=ci/other/pylintrc --ignore-patterns=".*\.so$,.*\.dll$,.*\.dylib$" python/mxnet/contrib/svrg_optimization/*.py example/svrg_module/*.py tests/python/unittest/test_contrib_svrg*
example/svrg_module/train.py (Outdated)
parser.add_argument('-b', dest="batch_size", help="define the batch size for training", type=int,
                    default=100, required=False)
parser.add_argument('-m', dest='metrics', help="create eval metric", type=str, required=False)
parser.add_argument('--gpus', type=str, help='list of gpus to run, e.g. 0 or 0,2,5. empty means using cpu')
Default value?
Default is mx.cpu() as defined in line 35.
a107e3b
to
b712796
Compare
@xcgoner Thank you for the pointer! I'd like to try out the improvements in the later iterations of this project.
Can you update the API docs here - module.md and optimization.md with references to the SVRG module.
@mx.optimizer.register
class SVRGOptimizer(mx.optimizer.Optimizer):
    """SVRGOptimizer is a wrapper class for two optimizers: self.aux_opt for accumulating full gradients and
Please write the doc string in the following format to maintain uniformity - https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L445-L488
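For illustration only, a docstring in that format could look roughly like this; the parameter name and constructor are placeholders rather than the PR's actual signature, and the register decorator is omitted to avoid clashing with the contrib class:

```python
import mxnet as mx

class SVRGOptimizer(mx.optimizer.Optimizer):
    """A wrapper optimizer intended for use with SVRGModule.

    It delegates to two optimizers: one accumulates the full gradients computed
    once every ``update_freq`` epochs, and the other applies the variance-reduced
    weight updates.

    Parameters
    ----------
    default_optimizer : str or Optimizer, optional
        Optimizer used for the variance-reduced weight updates, e.g. ``'sgd'``.
    """

    def __init__(self, default_optimizer='sgd', **kwargs):
        super(SVRGOptimizer, self).__init__(**kwargs)
        self.default_optimizer = default_optimizer
```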
import numpy as np


def set_up():
SVRGModule lacks test coverage. Can you add tests for module.save_checkpoint, module.load, module.forward, module.predict.
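A rough sketch of what one such test might look like; the symbol, shapes, and import path are assumptions based on the files discussed in this PR:

```python
import mxnet as mx
from mxnet.contrib.svrg_optimization.svrg_module import SVRGModule

def test_svrgmodule_forward_predict():
    # Build a tiny linear-regression network purely for illustration.
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('lin_reg_label')
    fc = mx.sym.FullyConnected(data=data, num_hidden=1)
    lro = mx.sym.LinearRegressionOutput(data=fc, label=label, name='lin_reg')

    mod = SVRGModule(symbol=lro, data_names=['data'],
                     label_names=['lin_reg_label'], update_freq=2)

    x = mx.nd.random.uniform(shape=(10, 4))
    y = mx.nd.random.uniform(shape=(10,))
    di = mx.io.NDArrayIter(x, y, batch_size=5, label_name='lin_reg_label')

    mod.bind(data_shapes=di.provide_data, label_shapes=di.provide_label)
    mod.init_params()

    # Forward/predict should work without training; checkpointing can be
    # exercised similarly with save_checkpoint/load into a temporary directory.
    out = mod.predict(di)
    assert out.shape == (10, 1)
```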
## SVRG Optimization in Python Module

### Problem Description
SVRG stands for Stochastic Variance Reduced Gradient, which was first introduced in the paper _Accelerating Stochastic
This README should be removed from here and the SVRG description and characteristics should go into the documentation of the SVRG optimizer.
* test_svrg_module.py: unittests for SVRGModule API
* test_svrg_optimizer.py: unittests for SVRGOptimizer API

### Examples:
add these in the README of the examples folder.
* examples/svrg_module/example_api_train.py: a demo script for training a SVRGModule using intermediate level api and high level api
* examples/svrg_module/example_inference.py: a demo script for SVRGModule inference

#### Benchmarks:
as the benchmark graphs are present in the "examples" folder, move these lines to the README in the examples folder.
#### Testing:

Unit Tests:
remove these lines about the Unit Tests.
data_reader.py to download the data.

This example trains a linear regression model using SVRGModule. Logs of the training results can be found in
experiments.log
where is experiments.log?
It will be generated when the train script runs.
Run the following command to get the coverage results: nosetests -v --with-coverage tests/python/unittest/test_contrib_svrg_*
Name                                                Stmts   Miss  Cover
----------------------------------------------------------------------
mxnet/contrib/svrg_optimization/__init__.py             3      0   100%
mxnet/contrib/svrg_optimization/svrg_module.py        175     92    47%
mxnet/contrib/svrg_optimization/svrg_optimizer.py      38      1    97%
If you notice, coverage for svrg_module.py is deficient. Please increase the coverage for that file; for example, the update_full_grads method is not tested, and you have overridden the fit method of the base class Module, so please add tests for both.
Also I am still not sure if the organization of the documentation for this new module is right. Marking @aaronmarkham to review the doc files.
tests/python/unittest/test_module.py (Outdated)
@@ -168,6 +168,7 @@ def test_module_layout():
    assert mod.get_outputs()[0].shape == dshape

    hdshape = (3, 4, 7)

nit: unnecessary blank line
Without trying it out, I just have a couple of suggestions for updates.
To be clear, at a minimum, you need to update api/python/index.md and add this module in the contrib section.
example/svrg_module/README.md (Outdated)
@@ -0,0 +1,26 @@
## SVRG Module Example
Introduce what it is as well. From your other document:
"SVRGModule is an extension to the Module API that implements SVRG (Stochastic Variance Reduced Gradients) optimization ..."
example/svrg_module/README.md (Outdated)
## SVRG Module Example

#### API Usage Example
SVRGModule provides both high-level and intermediate-level APIs while minimizing the changes with Module API.
Link back to the docs. This should link back to this spot - where you should add an entry for this module.
https://github.com/apache/incubator-mxnet/blob/master/docs/api/python/index.md#contrib-package
Since this comes from the example folder and outside of the docs, use a full static link.
@@ -1,188 +0,0 @@
{
Why delete this file and legacy_ndarray.v0?
It was done by mistake, I'll add them back when I push for more svrg_module test coverage. Thank you for the heads up!
@StephanieYuan From your design document I understand that SVRGOptimizer cannot be used with the Module API, which is why you wrote the SVRGModule API, which uses SVRGOptimizer under the hood instead of the usual SGD or other optimizers. Can you explain why in greater detail? If this is indeed the case, then the SVRGOptimizer class needs to be declared private using a leading underscore in its name; otherwise it will be very misleading to a user who might try to use SVRGOptimizer with the plain vanilla Module API. It should also be made clear in the docstring for the SVRGOptimizer that this optimizer is to be used with the SVRGModule alone. You might also want to consider moving the SVRGOptimizer to the
Great progress here. Thanks everyone.
@anirudhacharya - I would still prefer keeping the SVRG optimizer in contrib and allowing it to evolve. As mentioned in the comments above and in the design doc, other combinations like Adam+SVRG could also be used, though they have not been benchmarked. I would set the default optimizer for SVRGModule to be SVRGOptimizer, but not remove the option altogether and make SVRGOptimizer internal. Thoughts?
Thanks everyone for the feedback and input! I've rebased and resolved the conflict with master. I have also added some more unit tests for SVRGModule, especially for verifying the SVRG gradient calculations, and the test coverage is now: @sandeep-krishnamurthy I'm working on creating the MNIST example, but I expect it will take a few more days as I have other tasks in the queue. Maybe I can include the example in my next PR on SVRG in distributed mode? @anirudhacharya SVRGOptimizer is designed to be used with SVRGModule alone. The actual SVRG optimization logic is implemented in the SVRGModule, and the SVRGOptimizer is designed to assist the full-gradient accumulation in the KVStore. I've made SVRGOptimizer a private class and documented the reasoning behind the design in greater detail in docs/api/python/contrib/svrg_optimization. Hopefully that clarifies things a bit. @aaronmarkham I've added a pointer to svrg_optimization in index.md and refined the docs in example/svrg_module/README.
LGTM. Left 2 minor comments, please fix. Thank you for your contributions.
This PR is ready to go after approval from @aaronmarkham for documentation and @anirudhacharya for his code review comments.
mod.init_params()
mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.1})
mod.update()
mod.save_checkpoint('test', 0, save_optimizer_states=True)
Recommended to use tempfile. This avoids race conditions in file access from other tests, and cleanup is also easy. Something like:
import tempfile
import os
....
tmp = tempfile.mkdtemp()
tmpfile = os.path.join(tmp, 'svrg_test')
mod.save_checkpoint(tmpfile, .....)
tests/python/unittest/test_module.py (Outdated)
@@ -815,4 +815,4 @@ def test_forward_types():

 if __name__ == '__main__':
     import nose
-    nose.runmodule()
+    nose.runmodule()
nit: add a new line (pep)
@sandeep-krishnamurthy it now looks better after SVRGOptimizer was made private. Including SVRGOptimizer in
This is how the documentation for this module renders right now - http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-12376/17/api/python/contrib/svrg_optimization.html
Private members like _SVRGOptimizer should not show up in the documentation
The description for the SVRGModule is not rendering. cc: @aaronmarkham
Please address the comments on the svrg_optimization.md file.
@anirudhacharya I'll fix it soon. Is there a way that I can see the rendered view before it gets pushed?
@StephanieYuan some of the issues are with the content of the docs and a few are with how the content renders. @aaronmarkham might be able to help with how the docs render. You will probably have to build the docs and deploy them locally.
I just updated the how-to guide for building the docs.
I'm not sure why those private members are showing up. Sphinx should recognize them as such if they're defined that way. Be sure to adjust settings.ini when testing the docs build; otherwise you'll be waiting forever each time while the Scala build runs.
@aaronmarkham Thanks for the pointer!
In the examples folder, could you please add the code you used to generate the benchmarking graphs for the linear regression example? Since you are including the graphs in the examples folder, you should also have the code that can reproduce those results.
@anirudhacharya Sounds good! It's a Jupyter Notebook. I'll add it to the example folder.
…c. This version supports single machine SVRG with single cpu, gpu and multi-gpus.
LGTM
Thanks for the awesome work.
Having docs, an example Jupyter notebook, and good test coverage is super useful.
…he#12376) Implemented a python SVRGModule for performing SVRG Optimization Logic. This version supports single machine SVRG with single cpu, gpu and multi-gpus. (apache#12376)
Description
SVRG stands for Stochastic Variance Reduced Gradient, an optimization technique that complements SGD. It was introduced in the 2013 paper Accelerating Stochastic Gradient Descent using Predictive Variance Reduction.
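For readers unfamiliar with the method, the variance-reduced update from the paper can be summarized as follows (paraphrased), with the snapshot weights refreshed periodically, every update_freq epochs in this PR:

```latex
% Full gradient at the snapshot \tilde{w}, recomputed every update_freq epochs
\tilde{\mu} = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{w})

% Variance-reduced stochastic update for a sampled index i_t
w_{t+1} = w_t - \eta \left( \nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \tilde{\mu} \right)
```

Because the correction term has zero mean, it reduces the variance of the stochastic gradient without biasing the update.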
Key Characteristics of SVRG:
SVRG Module should:
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments